Received: from iesd.auc.dk by optima.cs.arizona.edu (5.65c/15) via SMTP
	id AA27274; Tue, 16 Mar 1993 15:39:32 MST
Received: from yellow.iesd.auc.dk by iesd.auc.dk with SMTP id AA07007
	(5.65c8/IDA-1.5/MD for <tsql@cs.arizona.edu>); Tue, 16 Mar 1993 23:39:48 +0100
Date: Tue, 16 Mar 1993 23:39:48 +0100
From: "Christian S. Jensen" <csj@iesd.auc.dk>
Message-Id: <199303162239.AA07007@iesd.auc.dk>
To: tsql@cs.arizona.edu
Subject: Re: Benchmark initiative

Jim, Al, and Alex,

I hope you all survived the worst snow storm in 100 years :-)

Thanks for your interest in the TSQL Benchmark. I have read your posting to the tsql list with great interest. I agree with your observations, and I thank you for taking the time to communicate them.

Let me now attempt to address your concerns within the context of the benchmark effort. Some of your concerns can be met now, and the rest may need to await a later edition of the benchmark.

>From info-tsql-sender@cs.arizona.edu Tue Mar 16 20:42:45 1993
>
>We would like to make two comments on the proposal to develop a
>comprehensive set of natural language queries as a test of "goodness"
>of various query languages and algebras.

In the initial proposal for a database schema for the benchmark (sent to tsql a few days ago), I included this characterization (based on Rick's postings to the tsql list).

"The central goal of this document is to provide the temporal database community with a {\em comprehensive consensus benchmark} for temporal query languages which is {\em independent} of any existing language proposal. This is not a performance benchmark, but a {\em semantic} benchmark which is intended to be an aid in evaluating the user-friendliness of proposals for temporal query languages. Thus, temporal query languages should ideally be able to express the benchmark queries both conveniently and naturally."

To me, the key word is user-friendliness.
The benchmark is intended to be a valuable tool for designers of user-level query languages in general and of a TSQL language, in particular. The benchmark is not intended to cover algebras which are not user-level languages. Using the benchmark, a designer may become aware of boolean seams (Shashi has shown the importance of avoiding these), of violations of the 0-1-infinity principle, of lack of orthogonality, of other types of inconsistencies (see the PL literature), etc.

>First, we feel that a certain classification of various queries has to
>be established before the set of queries is proposed. As an *initial*
>suggestion, we can classify temporal queries as follows.
>
>                 | HISTORICAL | ROLLBACK | BITEMPORAL
>--------------------------------------------------------
>UNGROUPED        |            |          |
>                 |            |          |
>GROUPED          |            |          |
>                 |            |          |
>TEMP. AGGREGATES |            |          |
>                 |            |          |
>SCHEMA VERSIONING|            |          |
>                 |            |          |
>OTHER FEATURES   |            |          |
>--------------------------------------------------------
>
>Then we can place one or several queries in each cell of this matrix.

I think you are exactly right that a classification (or taxonomy) is needed before queries should be proposed. That was what I tried to indicate in the introductory statement of the schema proposal and in the plan that accompanied the schema proposal.

"% The purpose of the following draft is then to define a taxonomy to
 % be used for categorizing the benchmark queries that will follow."

"Three tasks must be accomplished initially.

   Task 1: Agree on a database schema.
   Task 2: Agree on an instance of the schema.
   Task 3: Agree on a suitable taxonomy for the benchmark queries.

 These tasks will be addressed sequentially during the next weeks. When they are completed, the benchmark will be populated with queries."

I think your taxonomy is very helpful. It is very much a top-down taxonomy that makes it possible to classify a very wide variety of queries.
In order for the current version of the benchmark to not become too big (I would not like it to exceed, say, 100 queries), Rick suggested a restricted scope. Thus, only valid time is addressed, and aggregates as well as schema versioning are presently not addressed. As a result, most queries in the initial benchmark will fall into only a couple of categories, making additional refinement desirable.

I will request the help of you, Shashi, Ed, Patrick, Fabio, Maria, Paolo, and Abdullah (among others) when Task 3 is addressed. Grouping will certainly be an important aspect of this taxonomy and of the queries themselves, and I hope that you will be able to help ensure that it is present.

>The second comment is that the development of the benchmark should not
>be a substitute for a rigorous *theoretical* study of expressive
>powers of various temporal query languages and algebras. It is not
>entirely clear what the goal of the benchmark is. What seems
>necessary is a kind of typography of temporal data models as suggested
>by the above table and discussion. For example, one data model can be
>grouped bitemporal, another be ungrouped historical with aggregates.

I will emphasize in the introduction of the schema proposal document that the benchmark is not intended to be such a substitute. The current formulation is:

"While the benchmark is not intended to constitute a metric for query language completeness, ..."

I will change the formulation to this:

"The benchmark is {\em not} intended to constitute a metric for query language completeness, and as such it is not a substitute for a rigorous {\em theoretical} study of expressive powers of various temporal query languages."

Making a stronger and more explicit point out of this is very appropriate.

I hoped that the introduction would have explained what was the goal of the benchmark. I will gratefully accept additional clarifications, to be added to the document.

>Perhaps, the benchmark can be useful in developing such a typography.
>Each type of data model should support a class of queries the model
>embodies and should have its own standard of completeness. We believe
>that this standard should be developed in the terms of an appropriate
>logic (as in the classical relational case) rather than trying to
>determine expressive power "by consensus" (we would not want to say
>that one language is more expressive than another if it can express
>95% of the benchmark queries and the other one only 87%).

This is again an important point! I have seen papers argue that one algebra is better than another algebra because the former satisfies more criteria (from the Comp Surveys paper by Rick and Ed M) than the latter. That is very unfortunate. Similarly, we must try to avoid this use of the benchmark.

For now, I'll add the following text to the introduction. This text may have to be refined later on.

"It is emphasized that using the benchmark as an advanced, quantitative scoring system for comparing languages makes little sense. Thus, one language is not necessarily superior to another just because one is capable of expressing more benchmark queries than the other. Rather, the focus is on user-friendliness."

Best regards,

Christian